SUMMARY



The Basic and Intermediate English Language Tests (ELTs) are used to make decisions on hiring
of foreign nationals at bases overseas. Occasionally, the tests are also used to determine bonus
pay. The Basic and Intermediate ELTs were last revised in 1965 and 1967, respectively;
therefore, the Air Force Civilian Personnel Center requested that the Air Force Human Resources
Laboratory update both ELTs.

The existing Basic ELT consisted of four tests: Writing, Listening, Reading, and Speaking.
The existing Intermediate ELT consisted of three parts that essentially measured reading
ability. Revision of these tests entailed making two forms each of the Basic and Intermediate
ELTs, each of which would contain four tests (Writing, Listening, Reading, and Speaking) of 25
items each.

This project was accomplished in three phases. Phase 1 administered a replacement item pool
to native English-speaking subjects. Air Force basic trainees were used in this phase; 348 were
used for the Basic ELT item pool and 635 were given the Intermediate ELT item pool. Results
showed the basic trainees missed very few items, demonstrating that knowledge of English alone
was sufficient for answering these items. In Phase 2, these item pools were pretested on samples
of 99 Defense Language Institute foreign students. The item pools were administered along with
the existing tests to ensure a comparable level of difficulty between the old and new
instruments. Final item selection for the Basic ELT and selection of items for the Intermediate
ELT field test item pool were based on the results of this phase. The last phase involved field
testing the items on foreign nationals presently working at bases overseas. These results
confirmed that the revised Basic ELT could discriminate among lower English-language ability
subjects and provided the basis for final item selection for the Intermediate ELT.

This project culminated in two forms each of the Basic and Intermediate ELTs. Each form
contains four 25-point tests. The Basic and Intermediate ELTs were revised to be complementary
instruments, each containing a Writing, Listening, Reading, and Speaking test, with the
Intermediate ELT being more difficult than the Basic ELT.

In future research, it is recommended that these tests be normed on job applicants. These
norms might then be used to decide whether to administer the Basic ELT or to administer the
Intermediate ELT.

REVISION OF THE BASIC AND INTERMEDIATE
ENGLISH LANGUAGE TESTS



I. INTRODUCTION 

The English Language Tests (ELTs) are used to test foreign nationals seeking employment at 
bases overseas on their English-language proficiency. There are currently two versions of ELTs: 
the Basic and the Intermediate. 

The Basic ELT consists of a Speaking test and a Listening test (which are preceded by a
Speaking and Listening Warm-Up Exercise) and a Reading test and a Writing test (preceded by a
Reading and Writing Warm-Up Exercise). Testing times for each are as follows: Listening test -
3.3 minutes, Speaking test - 5 minutes, Reading test - 3 minutes, and Writing test - 5 minutes.
Answers and distractors in the multiple-choice Listening test are presented in picture form. The
stems in the other three tests are given in picture form. Each test has 20 items. All four
tests have two parallel forms.

The Intermediate ELT has three sections. Part I measures vocabulary and contains 30 items.
Part II measures grammar and is made up of 27 fill-in-the-blank items. Part III has 23 items
that measure reading comprehension. All items are of the multiple-choice type. Testing times
for each section are as follows: Part I - 15 minutes, Part II - 15 minutes, and Part III - 20
minutes. As in the case of the Basic ELT, there are two parallel forms of the Intermediate ELT.
Table 1 gives a description of the Basic and Intermediate ELTs.

Table 1. Construction of the Existing Basic and Intermediate ELTs

Test                     Stem characteristics           Response characteristics

Basic Listening          Spoken sentence                Four-picture, multiple-choice
Basic Speaking           Picture                        Free response
Basic Reading            Picture                        Four-word, multiple-choice
Basic Writing            Picture                        Supply missing word
Intermediate Part I      Underlined word in sentence    Four-word, multiple-choice
                         Word analogy
Intermediate Part II     Missing word in sentence       Three-word, multiple-choice
Intermediate Part III    Whole sentence-sentence        Four-sentence, multiple-choice
                         analogy

Although the ELTs proved to be an effective screening device, several problems have become
apparent. The last revision of the ELTs was made in 1967, and the currency of the tests is
questionable. Also, due to the length of time the tests have been in the field, the issue of
compromise has been raised. Finally, there is a lack of documented validation of the ELTs.



An attempt was made in this effort not only to update the ELTs but also to improve them.
They were improved by measuring all facets of language ability. The use of a language has four
components: listening, reading, speaking, and writing. Of the 10 tests of English-language
proficiency described in Buros (1978), none appropriately tested all four components in adults,
although a number of studies have sought to suggest ways to improve language ability
measurement. Hisama (1977b) defended the use of multiple measures in order to avoid
mismeasurement in testing English as a second language. In order to increase the effectiveness
of a test that measures reading and listening, Pike (1979) developed criterion measures of
speaking and writing ability to supplement the test. Lombardo (1981) developed an assessment
battery that measured receptive language (reading and listening). She concluded receptive area
tests were valid measures of language proficiency since they were interrelated with expressive
(writing and speaking) areas. The study went on to note, however, that the receptive area
precedes the expressive area in the acquisition of language. From this finding, it seems
receptive area tests are valid only with elementary-level examinees.

"Banding" has been proposed as an effective method of determining the level of
English-language proficiency. This is a system where the level of proficiency is divided into
bands, ranging from beginner to native speaker. Corbett (1980) stated that banding is most
useful when the specific purpose for which the language is to be used can be specified. Good
banding standards can be maintained by designing a variety of tests. This method is similar to
the ELTs in that there are both elementary (Basic) and advanced (Intermediate) levels of the test.

The CLOZE procedure has been extensively researched and has been found to be a reliable,
valid, and practical measure of English-language proficiency. This is a technique developed by
Taylor (1953) where every nth word is deleted from a paragraph. The examinee then supplies the
missing word. Stubbs and Tucker (1974) validated a CLOZE test with an English proficiency
entrance examination with excellent results. The CLOZE procedure was compared to several
measures of English-language proficiency by Hisama (1977a) and was found to be both reliable and
valid.

CLOZE tests have also been used in a multiple-choice format. Scholz and Scholz (1981) found
open-ended and multiple-choice CLOZE tests appeared similar in their relationship to general
English proficiency. Although multiple-choice tests have been criticized, they are a viable
means of testing language proficiency. Schulz (1977) determined that objective, multiple-choice
tests were more useful than simulated conversation tests as instructional aids for learning a
foreign language.

Speaking tests are the most difficult to administer and score of all the language proficiency
tests because they are somewhat subjective in nature. Subjectivity can be
reduced by using the average of two judges' ratings, according to Mullen (1978). Many formats
have been proposed to assess speaking ability. Some of these include pictures to elicit speech,
reading short sentences, and assigning a topic to elicit a sustained speech.

The last point that needs to be considered in developing a language test is how it should be
administered. Many instructions for English-language proficiency tests are given in English.
The logic behind this is that if a person knows enough English to take the test, that person
should be able to understand the instructions in English. Both the Basic and Intermediate ELTs'
directions are given in the native language. This will be continued for the revised ELTs.
However, Ramos (1981) showed that when instructions for a test were given in the native language
of the person taking the test, significant gains in scores resulted. The effects of this on test
validity for educational or job success criteria are not known.

The Basic and Intermediate tests were revised by first generating 120 items for each test.
Second, the item pools were administered along with the existing tests to native English-speakers
to ensure all ELT items tested only English proficiency and not specialized knowledge or other
extraneous factors. Next, pretesting with the ELTs occurred on a small group of foreign students
to ensure that items discriminated among ability levels of non-English-speakers. Finally, a
field test was conducted on foreign employees for final item selection.



II. TEST CONSTRUCTION



Basic English Language Test 

The intent of the revision of the Basic ELT was to increase the number of items in each test
from 20 to 25 but to allow the content to remain unchanged. This would allow easier test score
interpretations (total score of 100 instead of 80), and it would increase test reliability.
Therefore, 120 new items were generated for each test that were similar in nature to those in the
existing tests.

The first step was to categorize the existing tests into some meaningful context. The
correct response to each item was assigned a word frequency according to Carroll, Davies, and
Richman (1971). These frequencies were categorized according to the three broad frequency
categories established by Lorge and Thorndike (1944). These categories were at least 100
occurrences per million, at least 50 occurrences per million, and less than 50 occurrences per
million. New items were chosen for each test according to the same proportion of difficulty as
appeared in the original versions of the Basic ELT. Lists of 120 new items per test were then
presented to the Aerospace Medical Division's Medical Illustration Section for graphic artwork.
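
The frequency banding described above can be illustrated with a short sketch. It assumes the word frequencies are already available as occurrences per million; the function names, band labels, and sample words are hypothetical and are not taken from the report.

```python
from collections import Counter


def frequency_band(occurrences_per_million: float) -> str:
    """Place a word in one of the three broad frequency categories."""
    if occurrences_per_million >= 100:
        return "at least 100 per million"
    if occurrences_per_million >= 50:
        return "at least 50 per million"
    return "less than 50 per million"


def band_proportions(word_frequencies: dict) -> dict:
    """Return the share of words falling in each frequency band."""
    counts = Counter(frequency_band(f) for f in word_frequencies.values())
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


# Hypothetical correct responses with made-up frequencies.
existing_test = {"house": 200.0, "dog": 120.0, "ladder": 60.0, "anvil": 10.0}
proportions = band_proportions(existing_test)
```

A replacement item pool could then be drawn so that its band proportions match those of the existing test, which is one way to read "the same proportion of difficulty."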

Next, distractors were generated for the two multiple-choice tests (Listening and Reading).
Listening test distractors were derived by cross-cultural phonetic similarities (e.g., "chicken"
[Spanish = pollo] distracting the word "pole"), by vowel contrasts (e.g., "ship" distracting
"sheep"), and by grammar (e.g., "house dog" distracting "dog house"). Reading test distractors
were created with spelling distractors (e.g., "bazball" distracting "baseball") and
similar-appearing English words (e.g., "army" distracting "arm"). No distractors were necessary
for the Writing and Speaking tests by their nature.



Intermediate English Language Test 

In contrast to the Basic ELT, a complete revision was necessary for the Intermediate ELT. A
100-point battery that was content-parallel to the Basic ELT was required. Although the existing
Intermediate ELT contained three sections, it essentially measured only reading ability. The new
Intermediate ELT was constructed to measure writing, listening, reading, and speaking abilities.

According to Lado (1961), writing a language consists of knowing the language's rules for
grammar, vocabulary, spelling, and punctuation. Assessing writing skill is less a matter of
sampling the act of physically writing words and sentences and more a matter of testing one's
knowledge of a language's writing rules. Therefore, for the Writing test, 120 multiple-choice
items were developed that were equally divided among testing rules for grammar, vocabulary,
spelling, and punctuation. Distractors were chosen according to the rule being tested (e.g.,
grammar: went, gone; vocabulary: lake, sea, ocean; spelling: light, lite; and punctuation: !, ?).

Listening test items were constructed with an aural English lead sentence and four English
sentences from which the test-taker must choose the most similar to the lead sentence in
meaning. The leads were all free utterances that can appear independently in conversations.
Care was taken to avoid technical material and to limit the leads to only one sentence. These
restrictions ensured that the content of the lead material was equally familiar (or unfamiliar)
to all test-takers. Distractors were selected primarily to determine whether the test-takers
understood the meaning of the leads. The distractors explored grammatical and/or syntactical
structure (e.g., "bicycle between two cars" versus "car between two bicycles") and vocabulary
(e.g., "equal" versus "different"). One hundred twenty multiple-choice items were developed.

The third multiple-choice test of the revised Intermediate ELT is the Reading test. This
test uses the CLOZE procedure described in the Introduction. The passages were taken from
discarded items of an Armed Services Vocational Aptitude Battery (ASVAB) updating effort. The
ASVAB is an aptitude test battery used by all of the Armed Services to select and classify
enlisted personnel. According to the FORCAST method of determining reading grade level (RGL),
which was developed by Caylor, Sticht, Fox, and Ford (1973), these passages had a mean RGL of
10.78. Every seventh word was deleted from these passages. The only exceptions were the first
and last sentences, which were left intact to provide an understandable context for the passage.
The 120 deleted words became the correct answers. Development of distractors varied according to
the target answer. Verbs and adverbs generally tested past tense and plurals (e.g., "is," "was,"
"are," "were"). Noun distractors made sense in the sentence but not in the context of the
passage. Adjectives tended toward opposites (e.g., "hot," "cold"), whereas combinations of
distractors were used for conjunctions (e.g., "and," "or," "where"). Thus, no single set of
rules was used to develop distractors, but they were selected according to how plausible they
were in the context of the passage.
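
The deletion rule described above (every seventh word removed, first and last sentences left intact) can be sketched as follows. This is only an illustration: the sentence splitter is naive, the function name `make_cloze` is hypothetical, and starting the word count at the first word of the second sentence is an assumption the report does not confirm.

```python
import re


def make_cloze(passage: str, n: int = 7, blank: str = "_____"):
    """Blank every nth word, leaving the first and last sentences intact.

    Returns the blanked passage and the list of deleted words (the answers).
    """
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    if len(sentences) < 3:
        return passage, []

    answers = []
    count = 0  # assumption: counting starts at the second sentence
    blanked_middle = []
    for sentence in sentences[1:-1]:
        words = sentence.split()
        for i, word in enumerate(words):
            if not re.search(r"\w", word):
                continue  # skip tokens that are pure punctuation
            count += 1
            if count % n == 0:
                # Keep any punctuation attached to the deleted word.
                lead, core, trail = re.match(r"^(\W*)(.*?)(\W*)$", word).groups()
                answers.append(core)
                words[i] = f"{lead}{blank}{trail}"
        blanked_middle.append(" ".join(words))

    return " ".join([sentences[0], *blanked_middle, sentences[-1]]), answers


sample = (
    "Intro sentence here. one two three four five six seven eight. "
    "nine ten eleven twelve thirteen fourteen fifteen. Closing sentence here."
)
cloze_text, cloze_answers = make_cloze(sample)
```

In a multiple-choice version such as the one described here, each deleted word would be paired with distractors chosen by hand; that step is not automated in this sketch.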

The Speaking test was adapted from the paradigm advocated by Mullen (1976). In this test,
two raters carry on a 15-minute conversation with the test-taker. After 15 minutes (in practice,
10 minutes was found to be sufficient), the two judges rate the individual's vocabulary,
pronunciation, fluency, grammar, and overall oral proficiency, based upon behaviorally anchored
rating scales. An example of the rating sheet is provided in Appendix A. Each scale ranges from
Poor to Excellent, with Poor = 1, Marginal = 2, Fair = 3, Good = 4, and Excellent = 5. Thus,
with five scales and a maximum of five points per scale, a total maximum score of 25 is possible
on the Speaking test. Twenty-five points was targeted to be the maximum score on each test.
This would yield a 100-point battery, which would parallel the Basic ELT.
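
As a small worked illustration of the scoring arithmetic above, the sketch below totals one completed rating sheet. The names `RATING_POINTS`, `SCALES`, and `speaking_score` are hypothetical, and the sketch scores a single rater's sheet only, since the text does not state how the two judges' ratings are combined.

```python
# Point values for each rating label, as given in the text.
RATING_POINTS = {"Poor": 1, "Marginal": 2, "Fair": 3, "Good": 4, "Excellent": 5}

# The five behaviorally anchored scales on the rating sheet.
SCALES = (
    "Vocabulary",
    "Pronunciation",
    "Fluency",
    "Grammar",
    "Overall Oral Proficiency",
)


def speaking_score(ratings: dict) -> int:
    """Sum the five scale ratings on one sheet (minimum 5, maximum 25)."""
    missing = [scale for scale in SCALES if scale not in ratings]
    if missing:
        raise ValueError(f"unrated scales: {missing}")
    return sum(RATING_POINTS[ratings[scale]] for scale in SCALES)


# Hypothetical completed sheet.
sheet = {
    "Vocabulary": "Good",
    "Pronunciation": "Fair",
    "Fluency": "Good",
    "Grammar": "Marginal",
    "Overall Oral Proficiency": "Fair",
}
total = speaking_score(sheet)  # 4 + 3 + 4 + 2 + 3
```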



III. ITEM SELECTION METHOD

The overall plan for item selection and test validation called for three phases, which
included administering the ELTs to native English-speakers, screening on a small group of foreign
students, and field testing with foreign nationals already working at bases overseas. Trying out
the revised Basic and Intermediate ELTs on English-speakers was necessary to detect any
extraneous factors in them, such as testing memory, intelligence, or technical matter. The
rationale for screening the ELTs on a small group of foreign students prior to field testing was
twofold. First, screening the ELTs provided evidence of whether the ELTs could discriminate
among foreigners as well as do other current testing instruments. Second, screening the ELTs
allowed a reduced item pool to be field tested. Final item selection was based upon the results
of the field test.

As mentioned above, the first phase entailed administering the Basic and Intermediate ELTs to
a native English-speaking group. It was first necessary to identify a sample of "average"
English-speakers. A random sample of Air Force basic trainees was selected for this purpose.
For the Basic ELT, a sample of 348 trainees was used, of which 66% were high school graduates,
76% were males, and 66% were less than 21 years old. The Intermediate ELT sample was composed of
635 basic trainees, of which 80% were high school graduates, 76% were males, and 74% were less
than 21 years of age. All 120 items on each subtest of the Basic ELT item pool were administered
to the former sample. The 120 items in each subtest of the Intermediate ELT item pool were
administered to the latter sample, along with the existing Intermediate ELT, in a counterbalanced
design. Any extraneous factors in the final items were avoided by eliminating items missed by
more than 75% of the basic trainees or items that showed significantly positive distractor
biserials.

The next phase of this project pretested both the existing and revised item pools on a group
of foreign students. Arrangements were made with the Defense Language Institute (DLI) to utilize
a sample of their students, who already had scores on the English Comprehension Level (ECL)
examination. The ECL is a test used by the Department of Defense to measure the English
proficiency of foreigners who receive U.S. military training. These scores would be used as a
measure of concurrent validity. The Basic ELT sample consisted of 99 students, of whom 90% had
12 years or more of education, 100% were male, and 72% were less than 28 years old. The
Intermediate ELT sample contained 99 students, of whom 90% had 12 or more years of education, 99%
were male, and 54% were less than 28 years of age. The existing ELTs and replacement item pools
were administered to the samples in a counterbalanced design. The results were used to make the
final item selection for the Basic ELT and to reduce the Intermediate ELT item pool to 60 items
per subtest for field testing.

The last phase of this project involved field testing the ELTs with foreign nationals
currently working at bases overseas. Because the format of the Basic ELT was essentially
unchanged, the new tests were field tested only on 17 employees at Howard Air Force Base (AFB),
Panama. This was done to ensure the Basic ELT could discriminate among foreign national
employees. The major thrust of the field testing centered on the Intermediate ELTs. The item
pools were administered to 490 foreign national employees randomly selected at 16 bases
overseas. The following nationalities were included in the field test: German, Portuguese,
Italian, Spanish, Turkish, Greek, Filipino, and Korean. Eighty-four percent of the sample had at
least 9 years of education and were at least 25 years old; 44% were male. In addition to
administering the Intermediate ELT item pools, a supervisor's rating sheet was distributed to
each subject's work supervisor. This supervisor's rating sheet gave a measure of the
Intermediate ELT's validity. Appendix B shows an example of the rating sheet.



IV. RESULTS 

When the results from pretesting the Basic ELT on basic trainees were analyzed, the mean
scores (on 120-item tests) were: for the Reading test, 115.41; for the Writing test, 114.91; and
for the Listening test, 116.72. Scoring for the Speaking test is on a nominal scale and, as
expected, the ratings' mode was "no detectable accent." Pretesting the Intermediate ELTs on
basic trainees provided similar results. Mean scores on each test were: Reading test, 102.94;
Writing test, 109.63; and Listening test, 115.13. Since only five items (of 360) on the Basic
ELT and only 30 items (of 360) on the Intermediate ELT failed to reach the .75 difficulty level
and none had significant positive distractor biserials, all of the items were presented to the
DLI students in the next phase. These 35 unacceptable items were subsequently eliminated.
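
The .75 difficulty screen can be illustrated with a brief sketch. It assumes that "difficulty" means the proportion of examinees answering an item correctly (so higher values indicate easier items), which is consistent with how the term is used in this report but is not defined in it. The function names and the response data are hypothetical, and the distractor biserial check is not implemented here.

```python
def item_difficulties(responses):
    """Proportion of examinees answering each item correctly.

    `responses` is a list of per-examinee lists of 0/1 item scores.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(examinee[i] for examinee in responses) / n_examinees
        for i in range(n_items)
    ]


def items_below_threshold(responses, threshold=0.75):
    """Indices of items whose proportion correct falls below the threshold."""
    return [
        i for i, p in enumerate(item_difficulties(responses)) if p < threshold
    ]


# Hypothetical scores: 4 native-speaking examinees by 3 items.
native_speaker_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
]
flagged = items_below_threshold(native_speaker_scores)
```

Here the third item (index 2) is answered correctly by only half of the examinees and is flagged; an item sitting exactly at .75 is retained.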

When the replacement item pools were administered to the foreign students at DLI, lower
scores were observed on all tests than were found with the basic trainees. Mean scores (of 120
items) on the Basic ELT were: Reading test, 78.99; Writing test, 79.75; and Listening test,
94.71. Each Basic Speaking test item's ratings were normally distributed. Final items were
selected by comparing the existing Speaking test item distributions with those of the replacement
item pool distributions. The criteria used for selection were similarity to the existing
Speaking test item difficulty level and the ability of the item to discriminate (i.e., having a
relatively normal distribution). Mean Intermediate ELT scores obtained by foreign nationals were
also lower than those of the basic trainees: Reading test, 78.81; Writing test, 81.81; and
Listening test, 87.27. As shown in Tables 2 and 3, the item pools selected for field testing
showed significant positive correlations with both forms of the existing ELTs and DLI's ECL
examination.

Table 2. Basic ELT Correlations on DLI Students
(N = 99)

                               Field test item pools (60 items/test)
Existing tests       Items     Reading     Writing     Listening

Reading-Form A        20         .66
Reading-Form B        20         .64
Writing-Form A        20                     .84
Writing-Form B        20                     .84
Listening-Form A      20                                 .55
Listening-Form B      20                                 .58
ECL examination                  .12         .79         .74

Note. All correlations were significant at the .01 level.




Table 3. Intermediate ELT Correlations on DLI Students
(N = 99)

                               Field test item pools (60 items/test)
Existing tests       Items     Reading     Writing     Listening

Part I Form A         30         .70         .65         .67
Part I Form B         30         .64         .58         .56
Part II Form A        21         .55         .45         .53
Part II Form B        21         .60         .55         .63
Part III Form A       23         .55         .41         .59
Part III Form B       23         .56         .52         .57
ECL examination                  .66         .59         .82

Note. All correlations were significant at the .01 level.



A comparison of difficulty levels was made between the Basic and Intermediate ELTs using the
data obtained from the DLI students. Since all students were tested on the ECL examination, mean
ELT scores were generated at various ECL score intervals. For example, students who scored
between 41 and 50 on the ECL had mean Basic ELT scores as follows: Listening - 40.75, Reading -
36.50, and Writing - 26.00. In the same ECL score range, students' Intermediate ELT scores were
the following: Listening - 20.60, Reading - 31.40, and Writing - 25.00. Although these data
should be viewed with caution due to the small sample cell sizes, it can be concluded that the
Intermediate ELT is more difficult than the Basic ELT.

The third and final phase of this project was the field test. The Basic Reading test scores
ranged from 13 to 49, with a mean of 33.59; the Writing test score mean was 23.16, with a range
of 7 to 50; and the Listening test scores ranged from 17 to 48, with a mean of 34.29. These
tests had a maximum score of 50. Results of the Speaking test revealed similar findings to those
for the DLI sample: good discrimination between high and low ability. When the test
reliabilities (Reading = .96, Writing = .96, and Listening = .92) obtained from the DLI sample
were considered along with the range of scores obtained in the field test, the Basic ELT showed
that it could discriminate among individuals in the Howard AFB sample.

As mentioned previously, the Intermediate ELT underwent a major revision. Therefore, the
field testing was much more extensive for the Intermediate than the Basic ELT. The mean
Writing test score (of a possible 60) was 49.26, standard deviation was 9.37, and test
reliability was .93. Mean Listening test score was 46.15, standard deviation was 12.92, and test
reliability was .96. The mean of the Reading test was 44.71, the standard deviation was 11.50,
and test reliability was .94. Using the scoring method described in the Test Construction
section, the mean Speaking test score was 20.24 (of a possible 25 points), with a standard
deviation of 3.99, and an interrater reliability correlation of .87. Intercorrelations of the
four tests and the supervisors' ratings are shown in Table 4. These correlations reveal
significant positive relationships among the Intermediate ELT tests and the supervisors' ratings.



Table 4. Intercorrelations of the Parts of the Intermediate
ELTs and Supervisors' Ratings (N = 490)

                        Writing test   Listening test   Reading test   Speaking test

Listening test              .80
Reading test                .84             .86
Speaking test               .46             .62              .54
Supervisors' ratings        .4)             .49              .47            .57

Note. All correlations were significant at the .01 level.



The final score for the Intermediate ELT is obtained by summing the four test scores. Using
the supervisors' ratings as a measure of validity, a correlation of .52 was found for this summed
score. This is lower than the .57 for the Speaking test and is somewhat surprising. The cause
for the drop in the validity coefficients is likely due to the lower variance of the Speaking
test in relation to the variances of the other three tests. If the individual tests could be
equally weighted in operation, higher validity would result. For example, by unit weighting the
Writing, Listening, and Reading tests and applying a weight of 3 to the Speaking test, the
validity is increased to .55.
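
The effect of weighting on composite validity follows from the standard formula for the correlation between a weighted sum of tests and a criterion: the numerator sums each weight times standard deviation times validity, and the denominator is the standard deviation of the weighted composite. The sketch below implements that general formula; it is not a reconstruction of the authors' computation, and the function name and the example values are hypothetical.

```python
from math import sqrt


def composite_validity(weights, sds, intercorrelations, validities):
    """Correlation between a weighted sum of tests and a criterion.

    weights            -- weight applied to each test score
    sds                -- standard deviation of each test
    intercorrelations  -- square matrix of test intercorrelations (1s on diagonal)
    validities         -- correlation of each test with the criterion
    """
    k = len(weights)
    # Covariance of the weighted composite with the (standardized) criterion.
    numerator = sum(weights[i] * sds[i] * validities[i] for i in range(k))
    # Variance of the weighted composite.
    variance = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * intercorrelations[i][j]
        for i in range(k)
        for j in range(k)
    )
    return numerator / sqrt(variance)


# Hypothetical two-test example: a low-variance test with the higher validity
# contributes little to an unweighted sum until its weight is raised.
corr = [[1.0, 0.5], [0.5, 1.0]]
unweighted = composite_validity([1, 1], [10.0, 2.0], corr, [0.40, 0.60])
reweighted = composite_validity([1, 5], [10.0, 2.0], corr, [0.40, 0.60])
```

In the hypothetical example the second test has the smaller standard deviation and the larger validity, so raising its weight raises the composite validity, which mirrors the explanation given in the paragraph above.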

Other than creating associated materials for the ELTs, such as administration manuals and
scoring keys, the final task of this project was to separate the field test item pools into two
operational versions. Information obtained from the DLI students in Phase 2 was used as a basis
for separating the items in the Basic ELT. Each item's level of difficulty was matched with
another's difficulty level to be placed in one of two alternate forms. This method resulted in
the following mean levels of item difficulty for each form on each test: Writing test = .59,
Listening test = .69, Reading test = .69, and Speaking test = 2.20.
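
One simple way to build two forms of matched difficulty is sketched below. It ranks the items by difficulty, pairs neighbors in the ranking, and alternates which form receives the easier member of each pair. This is an assumed procedure offered for illustration; the report does not describe the matching in this detail, and the function name and item data are hypothetical.

```python
def split_into_forms(difficulties: dict):
    """Split items into two forms with closely matched mean difficulty.

    `difficulties` maps an item identifier to its difficulty value.
    With an odd number of items, the last ranked item is left unassigned.
    """
    ranked = sorted(difficulties, key=difficulties.get)
    form_a, form_b = [], []
    for start in range(0, len(ranked) - 1, 2):
        lower, higher = ranked[start], ranked[start + 1]
        # Alternate the assignment so neither form collects all of the
        # lower-valued members of the pairs.
        if (start // 2) % 2 == 0:
            form_a.append(lower)
            form_b.append(higher)
        else:
            form_a.append(higher)
            form_b.append(lower)
    return form_a, form_b


def mean_difficulty(items, difficulties):
    """Mean difficulty of the items assigned to one form."""
    return sum(difficulties[i] for i in items) / len(items)


# Hypothetical item pool.
pool = {"item1": 0.50, "item2": 0.60, "item3": 0.70, "item4": 0.80}
form_a, form_b = split_into_forms(pool)
```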

The rationale for assigning items to Forms A and B of the Intermediate ELT was based on data
from the Phase 3 field test. Only 50 out of 60 items per test were needed from the field test
item pool. The statistically least powerful items were discarded. That is, items with positive
distractor biserials or items above the .92 level of difficulty were not selected to be included
in the final test forms. The remaining 50 items were then divided into two forms of 25 items
each, based on their item difficulties. The following were the mean levels of item difficulty
for both forms of each test: Writing test = .80, Listening test = .75, and Reading test = .75.
Based on the field test sample, the correlations between the individual test forms were .85 for
the Writing test, .91 for the Listening test, and .85 for the Reading test. According to the
Wherry and Gaylord (1943) estimate of reliability, the reliability for the composite of all
subtests of the Intermediate ELT was .g$. Appendix C gives a summary of the statistics on the
final versions of the Basic and Intermediate ELTs.



V. RECOMMENDATIONS 



From the data generated by this effort, it is concluded that two equivalent forms of the
Basic and Intermediate ELTs have been generated. Furthermore, based upon comparisons with the
ECL and existing ELTs, the new ELTs measure a person's command of English as a second language.
Therefore, it is recommended that the new Basic and Intermediate ELTs be implemented.

Interpretations of test scores could be enhanced by future research. It was not feasible to
collect data on a sample sufficiently large or representative of all worldwide applicants who
normally take the Basic and Intermediate ELTs. These tests could be adequately normed by
collecting test scores and demographic information on individuals who apply for work at bases
overseas and take the new ELTs. By doing this, separate norms could be established for each
language group. Also, these data could be used as a basis to decide whether to administer the
Basic ELTs or to administer the Intermediate ELTs. This would be accomplished by establishing
appropriate difficulty ranges for various ability levels.

APPENDIX A: SPEAKING TEST RATING SHEET



SPEAKING TEST RATING SHEET

Name

Employee ID Number

After the person being rated has been dismissed, circle either excellent,
good, fair, marginal, or poor on each of the five rating scales.

Vocabulary

Excellent - Uses a large number/variety of words correctly.
Good - Only occasionally uses a word incorrectly or has difficulty
       choosing a word.
Fair - Often has difficulty choosing an appropriate word.
Marginal - Great difficulty using words other than the most simple.
Poor - Is not able to express even a simple sentence.

Pronunciation

Excellent - Few, if any, traces of accent.
Good - Always understandable, but definite accent.
Fair - Heavy accent causes occasional misunderstandings.
Marginal - Very heavy accent, repetition necessary to convey meaning.
Poor - Accent causes speech to be barely understood.

Fluency

Excellent - Smooth and effortless speech.
Good - Speaks readily with only occasional hesitation.
Fair - Falters and hesitates often, pauses are frequent but usually short.
Marginal - Usually hesitant speech, sometimes forced into silence.
Poor - Halting and fragmentary speech, conversation virtually impossible.

Grammar

Excellent - Few, if any, grammar or word order problems.
Good - Occasional grammar or word order problems.
Fair - Errors often cause meaning of sentences to become obscured.
Marginal - Great difficulty using correct grammar or word order, frequently
           uses incorrect verb tense, nouns, adjectives, etc.
Poor - Speaking can't be understood due to grammar errors.

Overall Oral Proficiency - Basing your decision on all of the above criteria, 
rate the examinee on his or her overall command of the English language. 

Excellent 

Good 

Fair 

Marginal 

Poor 

APPENDIX B: SUPERVISOR'S RATING SHEET



SUPERVISOR'S RATING SHEET

First, print the employee's name and identifying number in the spaces
provided. Then, as objectively as you can, rate the employee using the
following eight scales. Simply circle either excellent, good, fair, marginal,
poor, or not observed on each of the scales. Please rate the individual on
all rating scales.



Name Employee ID Number 



1. Vocabulary

Excellent - Uses a large number/variety of words correctly.
Good - Only occasionally uses a word incorrectly or has difficulty
       choosing a word.
Fair - Often has difficulty choosing an appropriate word.
Marginal - Great difficulty using words other than the most simple.
Poor - Is not able to express even a simple sentence.
Not observed

2. Punctuation and Spelling

Excellent - Writing has virtually no punctuation or spelling errors.
Good - Makes occasional punctuation or spelling errors.
Fair - Frequent errors cause writing to be difficult to read.
Marginal - Many errors cause writing to be very difficult to read.
Poor - Extreme number of errors cause writing to be misunderstood.
Not observed

3. Grammar

Excellent - Few, if any, grammar or word order problems.
Good - Occasional grammar or word order problems.
Fair - Errors often cause meaning of sentences to become obscured.
Marginal - Great difficulty using correct grammar or word order, frequently
           uses incorrect verb tense, nouns, adjectives, etc.
Poor - Writing and speaking can't be understood due to grammar errors.
Not observed

4. Fluency

Excellent - Smooth and effortless speech.
Good - Speaks readily with only occasional hesitation.
Fair - Falters and hesitates often, pauses are frequent but usually short.
Marginal - Usually hesitant speech, sometimes forced into silence.
Poor - Halting and fragmentary speech, conversation virtually impossible.
Not observed

5. Pronunciation

Excellent - Few, if any, traces of accent.
Good - Always understandable, but definite accent.
Fair - Heavy accent causes occasional misunderstanding.
Marginal - Very heavy accent, repetition necessary to convey meaning.
Poor - Accent causes speech to be barely understood.
Not observed

6. Reading Comprehension

Excellent - Can read virtually any English word.
Good - Has some difficulty recognizing some English words.
Fair - Does not recognize many English words.
Marginal - Can read only simple English words.
Poor - Cannot understand most English words.
Not observed

7. Listening Comprehension

Excellent - Can understand oral instructions with no misunderstandings.
Good - Sometimes needs oral instructions repeated to understand what is
       being said.
Fair - Often misinterprets oral instructions, several repetitions sometimes
       necessary.
Marginal - Can only understand simple oral instructions, errors often occur.
Poor - Seldom understands oral instructions.
Not observed

8. Ability to perform job based on English proficiency

Excellent - Use of English does not impair job performance.
Good - English usage slightly affects employee's job performance.
Fair - Job performance is frequently hindered by use of English.
Marginal - Use of English severely affects job performance.
Poor - Lack of English skills causes job performance to be accomplished
       incorrectly most of the time.
Not observed

APPENDIX C: REVISED ELTs' DESCRIPTIONS



Table C-1. Basic ELTs Statistics

Test         Items    Mean difficulty    Reliability    A-B correlation

Writing       25          14.75              .98              .95
Listening     25          17.25              .92              .89
Reading       25          17.25              .98              .94
Speaking      25          13.75              .91              .35

Note. These data are based upon the DLI sample (N = 99).



Table C-2. Intermediate ELTs Statistics

Test         Items    Mean difficulty    Reliability    A-B correlation

Writing       25          20.07              .93              .85
Listening     25          18.30              .96              .91
Reading       25          13.68              .94              .85
Speaking      N/A         20.28              .87              N/A

Note. These data are based upon the overseas field test sample (N = 489).